Bayesian Hierarchical Mixtures of Experts

Authors

  • Christopher M. Bishop
  • Markus Svensén
Abstract

The Hierarchical Mixture of Experts (HME) is a well-known tree-structured model for regression and classification, based on soft probabilistic splits of the input space. In its original formulation its parameters are determined by maximum likelihood, which is prone to severe overfitting, including singularities in the likelihood function. Furthermore the maximum likelihood framework offers no natural metric for optimizing the complexity and structure of the tree. Previous attempts to provide a Bayesian treatment of the HME model have relied either on local Gaussian representations based on the Laplace approximation, or have modified the model so that it represents the joint distribution of both input and output variables, which can be wasteful of resources if the goal is prediction. In this paper we describe a fully Bayesian treatment of the original HME model based on variational inference. By combining ‘local’ and ‘global’ variational methods we obtain a rigorous lower bound on the marginal probability of the data under the model. This bound is optimized during the training phase, and its resulting value can be used for model order selection. We present results using this approach for data sets describing robot arm kinematics.
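To make the tree structure concrete, the following is a minimal NumPy sketch of the predictive mean of a depth-2 binary HME: each internal node applies a logistic gate (a soft probabilistic split of the input space), and each leaf is a linear-regression expert. All function and parameter names here are illustrative, not taken from the paper, and the sketch shows only the forward pass, not the variational training procedure.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hme_predict(x, gate_w, expert_W):
    """Predictive mean of a depth-2 binary HME (illustrative parameters).

    gate_w:   [v_root, v_left, v_right] -- one gating weight vector per
              internal node; each gate softly splits the input space.
    expert_W: four weight vectors, one linear-regression expert per leaf.
    """
    v_root, v_left, v_right = gate_w
    g0 = sigmoid(v_root @ x)    # P(go left at the root)
    g1 = sigmoid(v_left @ x)    # P(go left at the left child)
    g2 = sigmoid(v_right @ x)   # P(go left at the right child)
    # Mixing weight of each leaf = product of gate probabilities on its path.
    pi = np.array([g0 * g1, g0 * (1 - g1),
                   (1 - g0) * g2, (1 - g0) * (1 - g2)])
    mu = np.array([W @ x for W in expert_W])  # each expert's prediction
    return pi @ mu                            # gate-weighted mixture mean
```

With all gate weights at zero, every split is 0.5/0.5 and the prediction is the plain average of the four experts, which makes the soft-partitioning behaviour easy to check.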


Similar References

Hierarchical Mixtures of Naive Bayesian Classifiers

Naive Bayesian classifiers tend to perform very well on a large number of problem domains, although their representation power is quite limited compared to more sophisticated machine learning algorithms. In this paper we study combining multiple naive Bayesian classifiers by using the hierarchical mixtures of experts system. This novel system, which we call hierarchical mixtures of naive Bayesi...

Bayesian Inference in Mixtures-of-Experts and Hierarchical Mixtures-of-Experts Models With an Application to Speech Recognition

Machine classification of acoustic waveforms as speech events is often difficult due to context-dependencies. A vowel recognition task with multiple speakers is studied in this paper via the use of a class of modular and hierarchical systems referred to as mixtures-of-experts and hierarchical mixtures-of-experts models. The statistical model underlying the systems is a mixture model in which both ...

Adversarial Learning with Bayesian Hierarchical Mixtures of Experts

Many data mining applications operate in adversarial environments, for example, webpage ranking in the presence of web spam. A growing number of adversarial data mining techniques have recently been developed, providing robust solutions under specific defense-attack models. Existing techniques are tied to distributional assumptions geared towards minimizing the undesirable impact of given attack model...

Bayesian Rose Trees

Hierarchical structure is ubiquitous in data across many domains. There are many hierarchical clustering methods, frequently used by domain experts, which strive to discover this structure. However, most of these methods limit discoverable hierarchies to those with binary branching structure. This limitation, while computationally convenient, is often undesirable. In this paper we explore a Bay...

Bayesian Normalized Gaussian Network and Hierarchical Model Selection Method

This paper presents a variational Bayes (VB) method for normalized Gaussian network, which is a mixture model of local experts. Based on the Bayesian framework, we introduce a meta-learning mechanism to optimize the prior distribution and the model structure. In order to search for the optimal model structure efficiently, we also develop a hierarchical model selection method. The performance of...

Investigation of Bayesian Mixtures-of-Experts models to predict semiconductor lifetime

Investigating the reliability of a semiconductor device is time- and cost-consuming, but essential for industry and customers. To save resources, models are needed that predict the lifetime and the valid parameter range as a function of the stress conditions. The given semiconductor lifetime data show a mixture of two lognormal distributions [1], where the mixture weights of the two components depen...
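A two-component lognormal mixture of the kind mentioned above can be fitted by EM in log-space, since a lognormal mixture over lifetimes t is simply a Gaussian mixture over y = log(t). The sketch below is illustrative only (it is not the cited paper's method and omits the stress-dependent mixture weights); all names and the initialization strategy are assumptions.

```python
import numpy as np

def norm_pdf(y, m, s):
    """Gaussian density, used for the E-step responsibilities."""
    return np.exp(-0.5 * ((y - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def fit_lognormal_mixture(t, n_iter=200):
    """EM for a two-component lognormal mixture (illustrative sketch).

    Returns the weight of the first component and the log-space
    means and standard deviations of both components.
    """
    y = np.log(t)
    # Crude initialization: split the log-lifetimes at the median.
    w = 0.5
    med = np.median(y)
    mu = np.array([y[y <= med].mean(), y[y > med].mean()])
    sigma = np.array([y.std(), y.std()]) + 1e-6
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each data point.
        p0 = w * norm_pdf(y, mu[0], sigma[0])
        p1 = (1.0 - w) * norm_pdf(y, mu[1], sigma[1])
        r = p0 / (p0 + p1)
        # M-step: responsibility-weighted updates of all parameters.
        w = r.mean()
        mu = np.array([np.sum(r * y) / np.sum(r),
                       np.sum((1 - r) * y) / np.sum(1 - r)])
        sigma = np.sqrt(np.array([
            np.sum(r * (y - mu[0]) ** 2) / np.sum(r),
            np.sum((1 - r) * (y - mu[1]) ** 2) / np.sum(1 - r)])) + 1e-9
    return w, mu, sigma
```

On well-separated synthetic data the recovered log-space means land close to the generating values, which is a quick sanity check before applying such a model to real lifetime data.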



Publication date: 2003